[InstCombine] When canonicalizing clamp like, also consider certain sgt/slt cases #153240
Conversation
…cmp))

This ([s/z]ext (icmp)) is equivalent to (select icmp, [1/-1], 0), so it can be canonicalized too. This is generally not beneficial, because after the canonicalization we lose the ability to replace one of the selects with ([s/z]ext (icmp)). However, it is beneficial for this particular case:

```
%old_cmp1 = icmp sgt i32 %x, C2
%old_replacement = sext i1 %old_cmp1 to i32
%old_cmp0 = icmp ult i32 %x, C0
%r = select i1 %old_cmp0, i32 %x, i32 %old_replacement
```

It can be rewritten as the more canonical pattern:

```
%new_cmp2 = icmp sge i32 %x, C0
%new_clamped_low = smax i32 %target_low, i32 %x
%r = select i1 %new_cmp2, i32 -1, i32 %new_clamped_low
```

iff 0 s<= C2 s<= C0. The select can be lowered to:

```
%sext_cmp2 = sext i1 %new_cmp2 to i32
%r = or i32 %sext_cmp2, i32 %new_clamped_low
```
…sgt/slt cases

In particular, when %target_low=0, %target_high=-1, and C1=0,

```
%old_cmp1 = icmp slt %x, C2
%old_replacement = select %old_cmp1, %target_low, %target_high
```

might already have been combined into

```
%old_cmp1 = icmp sgt %x, C2
%old_replacement = sext %old_cmp1
```

For this particular case, the canonicalization allows a more optimized sequence using `smax` to be created.

```
%old_cmp1 = icmp sgt %x, C2
%old_replacement = sext %old_cmp1
%old_cmp0 = icmp ult i32 %x, C0
%r = select i1 %old_cmp0, i32 %x, i32 %old_replacement
```

If 0 s<= C2 s<= C0, this can be rewritten as:

```
%new_cmp1 = icmp slt i32 %x, 0
%new_cmp2 = icmp sge i32 %x, C0
%new_clamped_low = select i1 %new_cmp1, i32 0, i32 %x
%r = select i1 %new_cmp2, i32 -1, i32 %new_clamped_low
```

which in turn can be rewritten as (this already occurs from the canonicalized version):

```
%clamped_low = smax i32 %x, 0
%new_cmp2 = icmp sge i32 %x, C0
%sext = sext i1 %new_cmp2
%r = or i32 %sext, %clamped_low
```
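A rough C restatement of the rewrite (illustrative only, not part of the patch; the function names are made up, and the constants C0=256 and C2=0 are the ones used by the @clamp_max_sgt test added below):

```c
#include <stdint.h>

// Input shape: select (x u< 256), x, sext(x s> 0)
int32_t before(int32_t x) {
  return (uint32_t)x < 256 ? x : (x > 0 ? -1 : 0);
}

// Canonicalized shape: select (x s> 255), -1, smax(x, 0)
int32_t after(int32_t x) {
  int32_t clamped_low = x > 0 ? x : 0; // lowered to llvm.smax.i32(x, 0)
  return x > 255 ? -1 : clamped_low;
}
```

The `or`-based lowering of the final select mentioned above then corresponds to `(x > 255 ? -1 : 0) | clamped_low`: or'ing with all-ones yields -1, or'ing with zero leaves the clamped value unchanged.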
@llvm/pr-subscribers-llvm-transforms

Author: Ryan Buchner (bababuck)

Changes

In particular, when %target_low=0, %target_high=-1, and C1=0, the inner select may already have been combined into a sext(icmp), which prevents the clamp-like select-select folding from occurring if this is part of a select-select sequence. So, for this particular case, if 0 s<= C2 s<= C0, the pattern can be re-written into the canonical clamp form (this re-write is what this commit enables), which can then be re-written as the more efficient smax + select sequence (that step already occurs from the canonicalized version).

Full diff: https://github.com/llvm/llvm-project/pull/153240.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index eb4332fbc0959..4471e0ebfe025 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -1521,10 +1521,17 @@ static Value *canonicalizeClampLike(SelectInst &Sel0, ICmpInst &Cmp0,
m_CombineAnd(m_AnyIntegralConstant(), m_Constant(C0))))
return nullptr;
- if (!isa<SelectInst>(Sel1)) {
- Pred0 = ICmpInst::getInversePredicate(Pred0);
- std::swap(X, Sel1);
- }
+ auto SwapSelectOperands = [](ICmpInst::Predicate &Pred, Value *&Op0,
+ Value *&Op1) -> void {
+ std::swap(Op0, Op1);
+ Pred = ICmpInst::getInversePredicate(Pred);
+ };
+
+ if (!isa<SelectInst>(Sel1))
+ SwapSelectOperands(Pred0, Sel1, X);
+
+ if (!isa<SelectInst>(Sel1) && !isa<SExtInst>(Sel1))
+ SwapSelectOperands(Pred0, Sel1, X);
// Canonicalize Cmp0 into ult or uge.
// FIXME: we shouldn't care about lanes that are 'undef' in the end?
@@ -1575,17 +1582,30 @@ static Value *canonicalizeClampLike(SelectInst &Sel0, ICmpInst &Cmp0,
m_CombineAnd(m_AnyIntegralConstant(), m_Constant(C1)))))
return nullptr;
+ // Will create Replacement[Low/High] later for SExtICmp case
Value *Cmp1;
CmpPredicate Pred1;
Constant *C2;
Value *ReplacementLow, *ReplacementHigh;
- if (!match(Sel1, m_Select(m_Value(Cmp1), m_Value(ReplacementLow),
- m_Value(ReplacementHigh))) ||
+ bool FoldSExtICmp;
+ auto MatchSExtICmp = [](Value *PossibleSextIcmp, Value *&Cmp1) -> bool {
+ Value *ICmpOp0, *ICmpOp1;
+ return match(PossibleSextIcmp, m_SExt(m_Value(Cmp1))) &&
+ match(Cmp1, m_ICmp(m_Value(ICmpOp0), m_Value(ICmpOp1)));
+ };
+ if (!((FoldSExtICmp = MatchSExtICmp(Sel1, Cmp1)) ||
+ match(Sel1, m_Select(m_Value(Cmp1), m_Value(ReplacementLow),
+ m_Value(ReplacementHigh)))) ||
!match(Cmp1,
m_ICmp(Pred1, m_Specific(X),
m_CombineAnd(m_AnyIntegralConstant(), m_Constant(C2)))))
return nullptr;
+ // When folding sext-icmp, only efficient if C1 = 0 so we can make use of the
+ // `smax` instruction
+ if (FoldSExtICmp && !C1->isZeroValue())
+ return nullptr;
+
if (!Cmp1->hasOneUse() && (Cmp00 == X || !Cmp00->hasOneUse()))
return nullptr; // Not enough one-use instructions for the fold.
// FIXME: this restriction could be relaxed if Cmp1 can be reused as one of
@@ -1593,8 +1613,13 @@ static Value *canonicalizeClampLike(SelectInst &Sel0, ICmpInst &Cmp0,
// Canonicalize Cmp1 into the form we expect.
// FIXME: we shouldn't care about lanes that are 'undef' in the end?
+ bool SwapReplacement = false;
switch (Pred1) {
case ICmpInst::Predicate::ICMP_SLT:
+ // The sext(icmp) case is only advantageous for SGT/SGE since that enables
+ // max conversion
+ if (FoldSExtICmp)
+ return nullptr;
break;
case ICmpInst::Predicate::ICMP_SLE:
// We'd have to increment C2 by one, and for that it must not have signed
@@ -1615,7 +1640,7 @@ static Value *canonicalizeClampLike(SelectInst &Sel0, ICmpInst &Cmp0,
// Also non-canonical, but here we don't need to change C2,
// so we don't have any restrictions on C2, so we can just handle it.
Pred1 = ICmpInst::Predicate::ICMP_SLT;
- std::swap(ReplacementLow, ReplacementHigh);
+ SwapReplacement = true;
break;
default:
return nullptr; // Unknown predicate.
@@ -1644,6 +1669,14 @@ static Value *canonicalizeClampLike(SelectInst &Sel0, ICmpInst &Cmp0,
if (!Precond2 || !match(Precond2, m_One()))
return nullptr;
+ if (FoldSExtICmp) {
+ ReplacementLow = Constant::getAllOnesValue(Sel1->getType());
+ ReplacementHigh = Constant::getNullValue(Sel1->getType());
+ }
+
+ if (SwapReplacement)
+ std::swap(ReplacementLow, ReplacementHigh);
+
// If we are matching from a truncated input, we need to sext the
// ReplacementLow and ReplacementHigh values. Only do the transform if they
// are free to extend due to being constants.
diff --git a/llvm/test/Transforms/InstCombine/canonicalize-clamp-like-pattern-i1.ll b/llvm/test/Transforms/InstCombine/canonicalize-clamp-like-pattern-i1.ll
new file mode 100644
index 0000000000000..062c7b477a44b
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/canonicalize-clamp-like-pattern-i1.ll
@@ -0,0 +1,159 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes='instcombine<no-verify-fixpoint>' -S | FileCheck %s
+
+; Given a pattern like:
+; %old_cmp1 = icmp sgt i32 %x, C2
+; %old_replacement = sext i1 %old_cmp1 to i32
+; %old_cmp0 = icmp ult i32 %x, C0
+; %r = select i1 %old_cmp0, i32 %x, i32 %old_replacement
+; it can be rewritten as a more canonical pattern:
+; %new_cmp2 = icmp sge i32 %x, C0
+; %new_clamped_low = smax i32 %target_low, i32 %x
+; %r = select i1 %new_cmp2, i32 -1, i32 %new_clamped_low
+; Iff 0 s<= C2 s<= C0
+; Also, ULT predicate can also be UGE; or UGT iff C0 != -1 (+invert result)
+; Also, SLT predicate can also be SGE; or SGT iff C2 != INT_MAX (+invert res.)
+
+;-------------------------------------------------------------------------------
+
+; clamp-like max case, can be optimized with max
+define i32 @clamp_max_sgt(i32 %x) {
+; CHECK-LABEL: @clamp_max_sgt(
+; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[X:%.*]], 255
+; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.smax.i32(i32 [[X]], i32 0)
+; CHECK-NEXT: [[COND3:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[TMP2]]
+; CHECK-NEXT: ret i32 [[COND3]]
+;
+ %or.cond = icmp ult i32 %x, 256
+ %cmp2 = icmp sgt i32 %x, 0
+ %cond = sext i1 %cmp2 to i32
+ %cond3 = select i1 %or.cond, i32 %x, i32 %cond
+ ret i32 %cond3
+}
+
+; clamp-like max case with vector, can be optimized with max
+define <2 x i32> @clamp_max_sgt_vec(<2 x i32> %x) {
+; CHECK-LABEL: @clamp_max_sgt_vec(
+; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <2 x i32> [[X:%.*]], <i32 99, i32 255>
+; CHECK-NEXT: [[TMP2:%.*]] = call <2 x i32> @llvm.smax.v2i32(<2 x i32> [[X]], <2 x i32> zeroinitializer)
+; CHECK-NEXT: [[COND3:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> splat (i32 -1), <2 x i32> [[TMP2]]
+; CHECK-NEXT: ret <2 x i32> [[COND3]]
+;
+ %or.cond = icmp ult <2 x i32> %x, <i32 100, i32 256>
+ %cmp2 = icmp sgt <2 x i32> %x, <i32 98, i32 254>
+ %cond = sext <2 x i1> %cmp2 to <2 x i32>
+ %cond3 = select <2 x i1> %or.cond, <2 x i32> %x, <2 x i32> %cond
+ ret <2 x i32> %cond3
+}
+
+; Not clamp-like vector
+define <2 x i32> @clamp_max_vec(<2 x i32> %x) {
+; CHECK-LABEL: @clamp_max_vec(
+; CHECK-NEXT: [[OR_COND:%.*]] = icmp ult <2 x i32> [[X:%.*]], <i32 100, i32 256>
+; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt <2 x i32> [[X]], <i32 128, i32 0>
+; CHECK-NEXT: [[COND:%.*]] = sext <2 x i1> [[CMP2]] to <2 x i32>
+; CHECK-NEXT: [[COND3:%.*]] = select <2 x i1> [[OR_COND]], <2 x i32> [[X]], <2 x i32> [[COND]]
+; CHECK-NEXT: ret <2 x i32> [[COND3]]
+;
+ %or.cond = icmp ult <2 x i32> %x, <i32 100, i32 256>
+ %cmp2 = icmp sgt <2 x i32> %x, <i32 128, i32 0>
+ %cond = sext <2 x i1> %cmp2 to <2 x i32>
+ %cond3 = select <2 x i1> %or.cond, <2 x i32> %x, <2 x i32> %cond
+ ret <2 x i32> %cond3
+}
+
+; clamp-like max case, can be optimized with max
+define i32 @clamp_max_sgt_neg1(i32 %x) {
+; CHECK-LABEL: @clamp_max_sgt_neg1(
+; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[X:%.*]], 255
+; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.smax.i32(i32 [[X]], i32 0)
+; CHECK-NEXT: [[COND3:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[TMP2]]
+; CHECK-NEXT: ret i32 [[COND3]]
+;
+ %or.cond = icmp ult i32 %x, 256
+ %cmp2 = icmp sgt i32 %x, -1
+ %cond = sext i1 %cmp2 to i32
+ %cond3 = select i1 %or.cond, i32 %x, i32 %cond
+ ret i32 %cond3
+}
+
+; clamp-like max case, can be optimized with max
+define i32 @clamp_max_sge(i32 %x) {
+; CHECK-LABEL: @clamp_max_sge(
+; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[X:%.*]], 255
+; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.smax.i32(i32 [[X]], i32 0)
+; CHECK-NEXT: [[COND3:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[TMP2]]
+; CHECK-NEXT: ret i32 [[COND3]]
+;
+ %or.cond = icmp ult i32 %x, 256
+ %cmp2 = icmp sge i32 %x, 0
+ %cond = sext i1 %cmp2 to i32
+ %cond3 = select i1 %or.cond, i32 %x, i32 %cond
+ ret i32 %cond3
+}
+
+; Don't support SLT cases, need to select 0 as the low value, -1 as high value
+define i32 @clamp_max_slt(i32 %x) {
+; CHECK-LABEL: @clamp_max_slt(
+; CHECK-NEXT: [[OR_COND:%.*]] = icmp ult i32 [[X:%.*]], 256
+; CHECK-NEXT: [[COND:%.*]] = ashr i32 [[X]], 31
+; CHECK-NEXT: [[COND3:%.*]] = select i1 [[OR_COND]], i32 [[X]], i32 [[COND]]
+; CHECK-NEXT: ret i32 [[COND3]]
+;
+ %or.cond = icmp ult i32 %x, 256
+ %cmp2 = icmp slt i32 %x, 0
+ %cond = sext i1 %cmp2 to i32
+ %cond3 = select i1 %or.cond, i32 %x, i32 %cond
+ ret i32 %cond3
+}
+
+; Don't support SLE cases, need to select 0 as the low value, -1 as high value
+define i32 @clamp_max_sle(i32 %x) {
+; CHECK-LABEL: @clamp_max_sle(
+; CHECK-NEXT: [[OR_COND:%.*]] = icmp ult i32 [[X:%.*]], 256
+; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[X]], 1
+; CHECK-NEXT: [[COND:%.*]] = sext i1 [[CMP2]] to i32
+; CHECK-NEXT: [[COND3:%.*]] = select i1 [[OR_COND]], i32 [[X]], i32 [[COND]]
+; CHECK-NEXT: ret i32 [[COND3]]
+;
+ %or.cond = icmp ult i32 %x, 256
+ %cmp2 = icmp sle i32 %x, 0
+ %cond = sext i1 %cmp2 to i32
+ %cond3 = select i1 %or.cond, i32 %x, i32 %cond
+ ret i32 %cond3
+}
+
+; Not selecting between 0, x, and -1, so can't be optimized with max
+; Select between 0, x, and 1
+define i32 @clamp_max_bad_values(i32 %x) {
+; CHECK-LABEL: @clamp_max_bad_values(
+; CHECK-NEXT: [[OR_COND:%.*]] = icmp ult i32 [[X:%.*]], 256
+; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[X]], 0
+; CHECK-NEXT: [[COND:%.*]] = zext i1 [[CMP2]] to i32
+; CHECK-NEXT: [[COND3:%.*]] = select i1 [[OR_COND]], i32 [[X]], i32 [[COND]]
+; CHECK-NEXT: ret i32 [[COND3]]
+;
+ %or.cond = icmp ult i32 %x, 256
+ %cmp2 = icmp sgt i32 %x, 0
+ %cond = zext i1 %cmp2 to i32
+ %cond3 = select i1 %or.cond, i32 %x, i32 %cond
+ ret i32 %cond3
+}
+
+; Boundaries of range are not 0 and x (x is some positive integer)
+define i32 @clamp_max_offset(i32 %x) {
+; CHECK-LABEL: @clamp_max_offset(
+; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[X:%.*]], -10
+; CHECK-NEXT: [[OR_COND:%.*]] = icmp ult i32 [[TMP1]], 246
+; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[X]], 10
+; CHECK-NEXT: [[COND:%.*]] = sext i1 [[CMP2]] to i32
+; CHECK-NEXT: [[COND3:%.*]] = select i1 [[OR_COND]], i32 [[X]], i32 [[COND]]
+; CHECK-NEXT: ret i32 [[COND3]]
+;
+ %1 = add i32 %x, -10
+ %or.cond = icmp ult i32 %1, 246
+ %cmp2 = icmp sgt i32 %x, 10
+ %cond = sext i1 %cmp2 to i32
+ %cond3 = select i1 %or.cond, i32 %x, i32 %cond
+ ret i32 %cond3
+}
I am fine with the IR change. But we may need to handle the new usat pattern on AArch64: https://godbolt.org/z/nq8P6Yssx
I don't see how
This pattern comes from IR changes to ffmpeg: dtcxzyw/llvm-opt-benchmark#2670
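For context, the ffmpeg clamp that gives rise to this IR shape is presumably of the av_clip_uint8 kind; below is an approximate C sketch of that idiom (not the exact ffmpeg source — the real reproducer is in the linked issue), assuming an arithmetic right shift of negative values:

```c
#include <stdint.h>

// Approximate av_clip_uint8-style idiom. The (~a) >> 31 arm is what
// InstCombine folds to sext(icmp sgt a, -1), producing the pattern
// handled by this PR (compare @clamp_max_sgt_neg1 in the new test file).
static uint8_t clip_uint8(int a) {
  if (a & ~0xFF)        // true iff a is outside [0, 255]
    return (~a) >> 31;  // 0xFF if a > 255, 0x00 if a < 0
  return a;
}
```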
✅ With the latest revision this PR passed the C/C++ code formatter.
@dtcxzyw I am thinking that this should get merged and the
cc @davemgreen Do you think this PR should be blocked until the regression on ARM is fixed? If not, we can just file an issue to track the problem.
Co-authored-by: Nikita Popov <[email protected]>
Can just initialize high/low for the foldsext case already swapped.
Proofs:
https://alive2.llvm.org/ce/z/9BWdCc
https://alive2.llvm.org/ce/z/xSr9E5